Applied Bayesian Analyses in R

Part 3: a mixed effects example

Sven De Maeyer

Model comparison

Leave-one-out cross-validation

Key idea:

  • leave one data point out of the data

  • re-fit the model

  • predict the value for that one data point and compare with observed value

  • re-do this n times
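The naive procedure above can be sketched in a few lines of base R. To keep it fast, this sketch uses a plain lm() on the built-in mtcars data instead of a Bayesian model — re-fitting a brms model n times is exactly what makes the literal version impractical:

```r
# Naive leave-one-out cross-validation, sketched with lm() on mtcars
data(mtcars)
n <- nrow(mtcars)
errors <- numeric(n)
for (i in seq_len(n)) {
  fit <- lm(mpg ~ wt, data = mtcars[-i, ])      # re-fit without data point i
  pred <- predict(fit, newdata = mtcars[i, ])   # predict the left-out point
  errors[i] <- mtcars$mpg[i] - pred             # compare with the observed value
}
mean(errors^2)  # out-of-sample mean squared error over all n refits
```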

loo package

Leave-one-out as described is computationally almost infeasible: it would require re-fitting the model n times!

loo uses a “shortcut making use of the mathematics of Bayesian inference” 1

Result: \(\widehat{elpd}\), the “expected log predictive density” (a higher \(\widehat{elpd}\) implies better model fit, without being sensitive to over-fitting!)

loo code

loo_Mod1 <- loo(MarathonTimes_Mod1)
loo_Mod2 <- loo(MarathonTimes_Mod2)

Comparison<- 
  loo_compare(
    loo_Mod1, 
    loo_Mod2
    )

print(Comparison, simplify = FALSE)

#or

print(Comparison)
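A common rule of thumb for reading the resulting table (the numbers below are hypothetical, not output from the models above): a difference in \(\widehat{elpd}\) within roughly two standard errors of zero is usually treated as negligible.

```r
# Hypothetical loo_compare() values, for illustration only
elpd_diff <- -4.3  # elpd difference relative to the best model
se_diff   <-  2.1  # standard error of that difference
abs(elpd_diff) > 2 * se_diff  # TRUE here: the difference looks meaningful
```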

loo code

Bayesian Mixed Effects Model

New example WritingData.RData

  • Experimental study on writing instruction

  • 2 conditions:

    • Control condition (Business as usual)
    • Experimental condition (Observational learning)

Potential models

Model 1: Intercept varies between classes j + overall effect of FirstVersion

Frequentist way of writing…

Bayesian way of writing…

Potential models

Model 2: Intercept and effect of FirstVersion vary between classes j (random slopes)

Frequentist way of writing…

Bayesian way of writing…

or

Potential models

Model 3: M2 + Effect of condition

Your Turn

  • Open WritingData.RData

  • Estimate 3 models with SecondVersion as dependent variable

    • M1: fixed effect of FirstVersion_GM + random effect of Class ((1|Class))
    • M2: M1 + random effect of FirstVersion_GM ((1 + FirstVersion_GM |Class))
    • M3: M2 + fixed effect of Experimental_condition
  • Compare the models on their fit

  • What do we learn?

  • Make a summary of the best fitting model
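As a starting point, the three models of the exercise can be written down as plain R formula objects (variable names follow the slides; passing each formula to brm() with data = WritingData is assumed to work as in the earlier examples):

```r
# The three model formulas for the exercise
f_M1 <- SecondVersion ~ FirstVersion_GM + (1 | Class)
f_M2 <- SecondVersion ~ FirstVersion_GM + (1 + FirstVersion_GM | Class)
f_M3 <- SecondVersion ~ FirstVersion_GM + Experimental_condition +
  (1 + FirstVersion_GM | Class)

all.vars(f_M3)
# "SecondVersion" "FirstVersion_GM" "Experimental_condition" "Class"
```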

Divergent transitions…

  • Something to worry about!

  • Essentially: the sampling of parameter values went wrong

  • Fixes:

    • sometimes fine-tuning the sampling algorithm (e.g., control = list(adapt_delta = 0.9)) works
    • sometimes you need more informative priors
    • sometimes the model is just not a good model

Let’s re-consider the priors

Default brms priors

The priors of M3

Full model specification

Understanding LKJcor prior

Understanding the priors for the “fixed effects”

The overall intercept

Maybe a little wide?? But, let’s stick with it…

Understanding the priors for the “fixed effects”

  • What about the overall effect of FirstVersion_GM?

  • What about the overall effect of ExperimentalCondition?

Nice to know:

sd(WritingData$SecondVersion)
[1] 16.75928

Understanding the priors for the “fixed effects”

What about the overall effect of FirstVersion_GM?



# Normal distribution with mean = 1 and sd = 5
Prior_beta1 <- ggplot( ) +
  stat_function(
    fun = dnorm,    # We use the normal distribution
    args = list(mean = 1, sd = 5), # 
    xlim = c(-9,11)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effect of FirstVersion",
       subtitle = "N(1,5)") +
  geom_vline(xintercept=1,linetype=3)

Prior_beta1 

Understanding the priors for the “fixed effects”

What about the overall effect of ExperimentalCondition?

Assuming a “small effect size” = 0.2 SDs = an effect of 3.4 points on our scale

We also want to give probability to big effects as well as negative effects, so we set the standard deviation of our prior distribution wide enough (e.g., 17).
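The prior mean of 3.4 follows directly from the SD of SecondVersion shown earlier:

```r
# A "small" standardized effect of 0.2 SD, converted to the raw scale
sd_second <- 16.75928   # sd(WritingData$SecondVersion), from the earlier slide
0.2 * sd_second         # roughly 3.4 points on the SecondVersion scale
```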

# Normal distribution with mean = 3.4 and sd = 17
Prior_beta2 <- ggplot( ) +
  stat_function(
    fun = dnorm,    # We use the normal distribution
    args = list(mean = 3.4, sd = 17), # 
    xlim = c(-30.6,37.4)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effect of Experimental condition",
       subtitle = "N(3.4,17)") +
  geom_vline(xintercept=3.4,linetype=3)

Prior_beta2 

Understanding the variances between Classes!

Variance between classes for the intercept (expressed as an SD)?

How large are differences between classes for an average student (for first version)?

So, a high probability for SDs bigger than 13.3 — what does that mean?

Understanding the variances between Classes!

Imagine an SD of 13.3 for the differences between classes’ intercepts

95% of classes will score between 83.8 and 137

Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!
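The 83.8–137 range can be reconstructed by hand: it is the overall intercept implied by the interval (about 110.4, its midpoint) plus or minus two SDs of 13.3:

```r
# Reconstructing the 95% range of class intercepts: center ± 2 SD
center     <- 110.4   # implied overall intercept (midpoint of 83.8 and 137)
sd_classes <- 13.3    # hypothesized between-class SD
c(lower = center - 2 * sd_classes,
  upper = center + 2 * sd_classes)
# lower upper
#  83.8 137.0
```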

Understanding the variances between Classes!

Imagine an SD of 20 for the differences between classes’ intercepts

95% of classes will score between 70.4 and 150.4

Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!

Understanding the variances between Classes!

Variance between classes for the effect of FirstVersion (expressed as an SD)?

How large are differences between classes for the effect of first version?

So, a high probability for SDs bigger than 13.3 — what does that mean?

Understanding the variances between Classes!

Imagine an SD of 13.3 for the differences in slopes between classes

For 95% of classes the effect of first version will vary between -25.6 and 27.6!

Distribution alert!! This is not a probability distribution for a parameter but a distribution of hypothetical observations (classes)!!

Model 3 with custom priors

Set custom priors in brms

Custom_priors <- 
  c(
    set_prior(
      "normal(1,5)", 
      class = "b", 
      coef = "FirstVersion_GM"),
    set_prior(
      "normal(3.4,17)", 
      class = "b", 
      coef = "Experimental_condition"),
    set_prior(
      "student_t(3,0,5)", 
      class = "sd", 
      coef = "FirstVersion_GM",
      group = "Class")
  )

Estimate the model with custom priors

M3_custom <- brm(
  SecondVersion ~ FirstVersion_GM + Experimental_condition + (1 + FirstVersion_GM |Class),
  data = WritingData,
  prior = Custom_priors,
  backend = "cmdstanr",
  cores = 4,
  seed = 1975
)

Prior Predictive Check

Put priors in the context of the likelihood


Did you set sensible priors? How informative are the priors (taken together)?


  • Simulate data based on the model and the priors


  • Visualize the simulated data and compare with real data


  • Check if the plot shows impossible simulated datasets

In brms


Step 1: Fit the model with custom priors with option sample_prior="only"


M3_custom_priorsOnly <- 
  brm(
    SecondVersion ~ FirstVersion_GM + Experimental_condition + (1 + FirstVersion_GM |Class),
    data = WritingData,
    prior = Custom_priors,
    backend = "cmdstanr",
    cores = 4,
    seed = 1975, 
    sample_prior = "only"
)

Note

This will not work with the default priors of brms, as brms sets flat priors for the betas. So you have to set custom priors for the effects of the independent variables to be able to perform prior predictive checks.

In brms


Step 2: visualize the data with the pp_check( ) function


set.seed(1975)

pp_check(
  M3_custom_priorsOnly, 
  ndraws = 300) # number of simulated datasets you wish for

In brms

300 distributions of simulated datasets (scores on SecondVersion), overlaid with our observed data (as an anchoring point)

Check some summary statistics

  • How are summary statistics of simulated datasets (e.g., median, min, max, …) distributed over the datasets?

  • How does that compare to our real data?

  • Use type = "stat" argument within pp_check()

pp_check(M3_custom_priorsOnly, 
         type = "stat", 
         stat = "median")

Check some summary statistics

300 medians of simulated datasets (scores on SecondVersion), overlaid with the median of our observed data (as an anchoring point)

Check some summary statistics

  • Let’s check the minima of all simulated datasets based on our priors
pp_check(M3_custom_priorsOnly, 
         type = "stat", 
         stat = "min")

Check some summary statistics

300 lowest scores (in the simulated datasets), overlaid with the lowest observed score in our data (as an anchoring point)

Check our grouped data (we have classes)

For each data point, situated within its group (Class), a set of 300 predicted scores based on the priors only.

pp_check(
  M3_custom_priorsOnly ,
  type = "intervals_grouped",
  group = "Class",
  ndraws = 300)

Check our grouped data (we have classes)

Possible conclusion

Put priors in the context of the likelihood

The priors



are non-informative!! They will not predict our data perfectly.

Prior sensitivity analyses

Why prior sensitivity analyses?

  • Often we rely on arbitrarily chosen (default) weakly informative priors

  • What is the influence of the prior (and the likelihood) on our results?

  • You could set new priors ad hoc, re-run the analyses, and compare (a lot of work, without strict systematic guidelines)

  • Semi-automated checks can be done with the priorsense package

Using the priorsense package

Recently, a package dedicated to prior sensitivity analyses was released:

# install.packages("remotes")
remotes::install_github("n-kall/priorsense")

Key-idea: power-scaling (both prior and likelihood)

background reading:

YouTube talk:

Basic table with indices

First check is done by using the powerscale_sensitivity( ) function

  • column prior contains info on sensitivity to the prior (should be lower than 0.05)

  • column likelihood contains info on sensitivity to the likelihood (which we want to be high: ‘let our data speak’)

  • column diagnosis is a verbalization of a potential problem (- if none)

powerscale_sensitivity(M3_custom)

Basic table with indices

Sensitivity based on cjs_dist:
# A tibble: 47 × 4
   variable                                 prior likelihood diagnosis
   <chr>                                    <dbl>      <dbl> <chr>    
 1 b_Intercept                           0.00282      0.0448 -        
 2 b_FirstVersion_GM                     0.00201      0.0152 -        
 3 b_Experimental_condition              0.00170      0.0229 -        
 4 sd_Class__Intercept                   0.00549      0.169  -        
 5 sd_Class__FirstVersion_GM             0.00411      0.0296 -        
 6 cor_Class__Intercept__FirstVersion_GM 0.000935     0.171  -        
 7 sigma                                 0.00199      0.301  -        
 8 r_Class[1,Intercept]                  0.00116      0.0803 -        
 9 r_Class[2,Intercept]                  0.00114      0.133  -        
10 r_Class[3,Intercept]                  0.00167      0.123  -        
# ℹ 37 more rows

Visualization of prior sensitivity

powerscale_plot_dens(
  powerscale_sequence(
    M3_custom
    ),
  variables = c(
      "b_Intercept",
      "b_FirstVersion_GM",
      "b_Experimental_condition"
    )
  )

Visualization of prior sensitivity

Visualization of prior sensitivity

powerscale_plot_quantities(
  powerscale_sequence(
    M3_custom
    ),
  variables = c(
      "b_Experimental_condition"
      )
  )

Visualization of prior sensitivity